refactor(dbt): state assessments rebuild by GabyRangelB · Pull Request #3553 · TEAMSchools/teamster

GabyRangelB · 2026-03-30T19:14:07Z

Pull Request

Summary & Motivation

When merged, this pull request will rebuild the NJ and FL state assessment reporting layer. The core motivation is to move proficiency banding, demographic labeling, and assessment metadata computation upstream into intermediate models — so that reporting models become thin joins rather than logic-heavy CTEs. It also introduces a new Google Sheets comparison demographics source, consolidates the demographic comps pipeline, and adds several correctness fixes discovered during validation.

`stg_google_sheets__state_test_comparison_demographics`

New staging model replacing `stg_google_sheets__state_test_comparison`
Adds `custom_rows` CTE to synthesize HS ALG01 totals (aggregated from HS_09 and HS_10 rows, which official comp sources do not provide as a combined HS total)
Computes `aligned_level_test_code` (differentiates ALG01_MS from ALG01_HS), `school_level`, `grade_range_band`, `discipline`, and `total_proficient_students` directly in staging

`int_pearson__all_assessments`

Pushes proficiency banding (`is_proficient`, `testperformancelevel`), aligned demographic labels (`aligned_aggregate_ethnicity`, `aligned_ml_status`, `aligned_iep_status`), and metadata (`results_type`, `district_state`, `admin`, `aligned_test_code`) upstream from the reporting CTE into the intermediate model
New columns: `is_504`, `is_approaching_int`, `is_below_int`

`int_pearson__student_list_report` (new)

New intermediate model wrapping the student list report staging
Adds `aligned_test_code`, `is_proficient_int`, `results_type = 'Preliminary'`, `admin`, `subject`, and `district_state`

`int_fldoe__all_assessments` (kipptaf)

Adds `results_type`, `district_state`, `admin`, `subject`, `illuminate_subject`, `fast_aggregated_proficiency`, and `is_proficient_int` as computed columns so the kipptaf model no longer depends on the kippmiami deployment providing them
Fixes `achievement_level_int` sourcing — was broken by a regex on the text field which nulled out FSA records; now correctly reads `performance_level` from the kippmiami source
Adds `is_approaching_int` and `is_below_int` for FL parity with NJ
Filters PM3 and PM1 sections to `assessment_name = 'FAST'` to prevent EOC and Science records from contaminating the FAST rollup

`int_tableau__state_assessments_demographic_comps`

Replaces GROUP BY CUBE (1,024 combos) with explicit GROUPING SETS (12 combos) for ~85× reduction in computed groups
Consolidates the demographic comps intermediate chain from 2 models + macro into 1 model
Fixes self-join bug that made `region_matched` / `region_outperformed` flags dead columns
Adds automatic prelim score gating: `valid_prelim_assessments` CTE auto-suppresses NJ student list data for any (academic_year, assessment_name) already present in `int_pearson__all_assessments` Spring — eliminating the need to manually comment/uncomment the prelim branch each cycle
Fixes gate join to use `admin` instead of `season` (matching the actual column name in `int_pearson__all_assessments`)

`rpt_tableau__state_assessments_dashboard`

Simplifies reporting layer: removes inline CASE blocks for proficiency banding, subject mapping, and demographic labeling — now reads upstream-computed columns
Adds same automatic prelim score gating as the demographic comps model
Fixes ALG01 state comps fan-out: `state_comps` CTE now groups by `school_level` instead of `aligned_level_test_code`, and the join adds `e.school_level = c.school_level` — ensures grade 8 students match MS comps and grade 9+ students match HS comps without duplicating rows

`rpt_tableau__state_assessments_dashboard_comps`

Updates column references to match renamed columns in the modified demographics staging model

`rpt_tableau__academic_goals_rollup`

Swaps `stg_fldoe__fast` ref to `int_fldoe__all_assessments` for FL assessments, consolidating the FAST data path and enabling `is_approaching_int` / `is_below_int` to flow through without inline computation

`base_powerschool__course_enrollments`

Adds `standardized_discipline` column

`int_extracts__student_enrollments_subjects`

Refactored to use upstream columns rather than recomputing subject mappings inline

New model: `int_extracts__student_enrollments_courses`

YMLs

New or updated YML for: `int_pearson__all_assessments`, `int_pearson__student_list_report`, `int_fldoe__all_assessments` (kipptaf), `stg_google_sheets__state_test_comparison_demographics`, `int_tableau__state_assessments_demographic_comps`
Added `aligned_gender` to `int_extracts__student_enrollments` YML
Uniqueness tests added to stg, int, and rpt models in the comps pipeline

AI Assistance

This PR was largely Claude Code-assisted under human direction. Human directed the feature requirements and data model decisions (which fields to add, how prelim gating should work, the school_level join approach for ALG01, the GROUPING SETS refactor). Claude Code handled implementation, debugging, and row-count validation against production BigQuery tables.

Self-review

General

Update due date and assignee on the TEAMster Asana Project
Review the Claude Code Review comment posted on this PR. Address valid feedback; dismiss false positives with a brief reply explaining why.

dbt

Include a `[model_name].yml` properties file for all new models — see dbt Conventions
Include (or update) an exposure for all models consumed by a dashboard, analysis, or application
Breaking change? `stg_google_sheets__state_test_comparison` is disabled in favor of the new `stg_google_sheets__state_test_comparison_demographics`. Confirm no remaining refs to the old model.

CI checks

Trunk — passes. If it fails, run `trunk check` and `trunk fmt` locally, fix any issues, and push again.
dbt Cloud — passes. If it fails: click Details on the check → expand Invoke `dbt build ...` → Debug Logs. Or tag Claude on the PR with the error details.
Dagster Cloud — passes or not triggered.

Troubleshooting

…c upstream - Add proficiency banding, subject mapping, and metadata columns to int_pearson__all_assessments, int_fldoe__all_assessments, and int_iready__diagnostic_results - Add new int_pearson__student_list_report intermediate model - Add stg_google_sheets__state_test_comparison_demographics; disable old stg_google_sheets__state_test_comparison - Add standardized_discipline to base_powerschool__course_enrollments - Add new int_extracts__student_enrollments_courses model - Refactor int_extracts__student_enrollments_subjects to use upstream columns - Simplify rpt_tableau__state_assessments_dashboard and _comps by replacing inline CASE blocks with upstream column references - Update int_tableau__state_assessments_demographic_comps lineage to use int_pearson__student_list_report Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…g CTE - Switch state_comps CTE to stg_google_sheets__state_test_comparison_demographics - Move results_type, admin, season, subject, test_code upstream to int models - Rename test_code to aligned_test_code in int_pearson__student_list_report - Add admin and subject aliases to int_fldoe__all_assessments - Swap stg_pearson__student_list_report ref to int_pearson__student_list_report Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…ch flags Replace GROUP BY CUBE (1,024 combos) with explicit GROUPING SETS (12 combos) for ~85x reduction in computed groups. Consolidate the demographic comps intermediate chain from 2 models + macro into 1 model. Push demographic labels, comparison_entity, and test_code-derived columns upstream into the intermediate to simplify the reporting layer. Fix self-join bug that made region_matched/ region_outperformed flags dead columns. Add uniqueness tests to stg, int, and rpt models. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- dim_state_assessment_benchmarks.yml: keep expanded surrogate key description, use bare `- unique` test (inherits severity/store_failures from dbt_project.yml project-level defaults instead of repeating per-test) - base_powerschool__course_enrollments: keep both courses_credittype normalization (from main) and standardized_discipline (from this branch) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add results_type, district_state, admin, subject, illuminate_subject, and fast_aggregated_proficiency as computed columns in the kipptaf int_fldoe__all_assessments model instead of depending on the kippmiami upstream to provide them. This allows rpt_tableau__state_assessments_dashboard to build without waiting for a kippmiami deployment. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Refactor int_tableau__state_assessments_demographic_comps: replace self-contained assessment_scores CTE with three union branches (NJ official, NJ prelim, FL official); add test_code_metadata CTE from stg_google_sheets__state_test_comparison_demographics to replace inline school_level/grade_range_band/discipline CASE statements; fix aligned column references and unqualified ON clause columns - Add YML for int_fldoe__all_assessments (kipptaf): uniqueness test, model description, and full column definitions including new metadata columns (results_type, district_state, aligned_level_test_code, illuminate_subject, fast_aggregated_proficiency, is_proficient_int, admin, subject) - Add YML for int_pearson__all_assessments: uniqueness test, model description, full column definitions; fix stale columns (englishlearnerel, studentwithdisabilities removed; aligned_* demographic columns, is_proficient_int, season, admin added) - Add YML for int_pearson__student_list_report: uniqueness test on (source_relation, academic_year, administration, state_id, aligned_test_code), model description, full column definitions; fix missing trailing comma in SQL - Update stg_google_sheets__state_test_comparison_demographics YML: move data_tests before columns, remove redundant store_failures, add missing aligned_level_test_code column - Add aligned_gender to int_extracts__student_enrollments YML - Remove design spec (work complete) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- Add prelim score gating to int_tableau__state_assessments_demographic_comps: new prelim_assessments and valid_prelim_assessments CTEs automatically exclude NJ student list data for any (academic_year, assessment_name) already present in int_pearson__all_assessments Spring, eliminating the need to manually comment/uncomment the prelim branch each cycle - Qualify all column references in final SELECT with scores alias (s.) to satisfy RF02 after test_code_metadata join was introduced - Replace rolling 7-year window with fixed 2018 floor across all three score branches — 2018 is the earliest year with available comps data - Fix rpt_tableau__state_assessments_dashboard: inline-alias administration_window and assessment_subject from int_fldoe__all_assessments instead of expecting pre-computed admin/subject columns Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- Strip stg_google_sheets__state_test_comparison_demographics to SELECT * so all columns (school_level, grade_range_band, discipline, aligned_level_test_code, etc.) come directly from the _v2 sheet range - Update sources-external.yml with _v2 sheet range Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…hics rename - Rename test_name → assessment_name, test_code → aligned_test_code in dim_state_assessment_benchmarks (SQL + contract YML) and state_comps CTE in rpt_tableau__state_assessments_dashboard - Rename comparison_demographic_group/subgroup_aligned → aligned_comparison_demographic_group/subgroup across all three downstream models - Fix join conditions in rpt_tableau__state_assessments_dashboard against state_comps CTE (all three score sections) - Add school_level to uniqueness test on stg_google_sheets__state_test_comparison_demographics — differentiates MS_HS source rows from synthetic HS ALG01 aggregate rows Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…raphics - Add custom_rows CTE to union source data with synthetic HS ALG01 totals (aggregated from HS_09 and HS_10 rows, which official comp sources do not provide as a combined HS total) - Rename derived columns: comparison_demographic_group/subgroup_aligned → aligned_comparison_demographic_group/subgroup - Update aligned_level_test_code derivation to use aligned_test_code instead of test_code Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- Keep SELECT * from transformations CTE (main's explicit list was missing race_ethnicity computed in transformations) - Add testscorecomplete to union_relations include list, transformations CTE, and YML (new column introduced in main) - Keep full column descriptions from branch; reject main's stripped YML Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-09T20:03:43Z

Claude finished @cbini's task in 6m 26s —— View job

PR Review: refactor(dbt): state assessments rebuild

Gather context and understand PR scope
Read changed files and analyze code
Review dbt models and SQL transformations
Post comprehensive review feedback

This is a well-reasoned rebuild overall. The GROUPING SETS refactor is sound (~85x reduction vs. CUBE is real), the valid_prelim_assessments self-deactivating gate is a clever design, and the self-join fix for rpt_tableau__state_assessments_dashboard_comps is real and meaningful. Issues below are organized by severity.

Critical

1. Broken regexp_extract produces wrong aligned_test_code for grades 3-9

src/dbt/kipptaf/models/pearson/intermediate/int_pearson__student_list_report.sql lines 47-50:

when test_name like '%Mathematics%'
then concat('MAT', regexp_extract(test_name, r'.{6}(.{2})'))
when test_name like '%ELA%'
then concat('ELA', regexp_extract(test_name, r'.{6}(.{2})'))

r'.{6}(.{2})' extracts characters 7-8 of the test name. For "Grade 03 Mathematics" (double-digit), chars 7-8 = "03" → MAT03 ✓. For "Grade 3 Mathematics" (single-digit), chars 7-8 = " M" → MAT M ✗. Grades 3-9 (the majority of state assessment grades) would produce wrong codes. The correct regex is r'Grade (\d+)'.

When aligned_test_code is wrong or NULL (from the missing else), downstream joins to test_code_metadata and valid_prelim_assessments fail silently — rows still count in aggregates but with NULL school_level/grade_range_band/discipline.

Important

2. Dead NJ prelim UNION ALL branch in rpt_tableau__state_assessments_dashboard.sql

Lines 476-591 filter a.results_type = 'Preliminary' in the JOIN. But the assessment_scores CTE (lines 163-234) has its prelim section commented out (/* disabled until next december */), so no rows carry results_type = 'Preliminary'. This branch produces zero rows every run. The current asymmetry — one block commented out, the other live — is the actual bug. Either comment out the UNION ALL branch with the same annotation, or re-enable both together.

3. int_pearson__all_assessments contract enforcement disabled with no tracking

The PR body acknowledges this, but the YML contains no TODO, issue link, or comment. The model is a direct parent of extracts/ models (which have enforced contracts). At minimum, a # TODO: re-enable contract enforcement — see #<issue> comment should be added to the properties file before merge.

4. Race/ethnicity source asymmetry across union branches

int_tableau__state_assessments_demographic_comps.sql:

NJ official: uses a.aligned_aggregate_ethnicity — race from Pearson assessment file
NJ prelim: uses inline CASE on e.race_ethnicity — race from PowerSchool enrollment
FL official: same inline CASE on e.race_ethnicity — race from enrollment

The inline CASE in prelim/FL also lacks an else clause, so any unrecognized race_ethnicity code returns NULL instead of a labeled bucket. Cross-year comparisons for a student can land them in different demographic buckets depending on which branch processes them.

5. valid_prelim_assessments gate relies on unvalidated string match

Lines 42-52: pa.test_type = p.assessment_name must match exactly for the gate to work (e.g., Dagster partition key 'NJSLA' must equal the hardcoded assessment_name = 'NJSLA' in official records). If the naming diverges, prelim data silently flows through even when official scores exist. Needs either an accepted_values test on test_type in the staging model or a comment documenting the coupling.

6. YML: Missing description: on model and all columns — three files

Per src/dbt/CLAUDE.md: "All new or modified models require description: on the model and every column":

int_tableau__state_assessments_demographic_comps.yml — no model description, no column descriptions
rpt_tableau__state_assessments_dashboard_comps.yml — check for missing descriptions on modified columns
The intermediate YML is missing descriptions entirely

7. YML: store_failures: true on individual tests — two files

Per convention: "Do not add store_failures: true to individual tests — the project default handles it."

int_tableau__state_assessments_demographic_comps.yml line 47
rpt_tableau__state_assessments_dashboard_comps.yml — same pattern

Both instances should be removed.

Minor

8. ALG01 HS aggregation sums percentages instead of weighted average (pre-existing, now load-bearing)

stg_google_sheets__state_test_comparison_demographics.sql lines 41-43:

sum(percent_proficient) as percent_proficient,
sum(total_students) as total_students,

This sums two percentage values (HS_09 + HS_10) rather than computing a weighted average. The outer SELECT then multiplies percent_proficient * total_students to get total_proficient_students — doubly wrong. Pre-existing, but now directly consumed by the refactored comps pipeline. Should be captured in a tracked issue.

9. Region self-join region_matched = true — undocumented tautology

rpt_tableau__state_assessments_dashboard_comps.sql lines 137-149: When a.comparison_entity = 'Region', the self-join b.comparison_entity = 'Region' matches the same row, so region_matched = true always. Likely intentional but needs a comment to prevent future confusion.

Summary

#	File	Category
1	`int_pearson__student_list_report.sql` — broken regex for grades 3-9	Critical
2	`rpt_tableau__state_assessments_dashboard.sql` — dead UNION ALL branch	Important
3	`int_pearson__all_assessments.yml` — contract enforcement disabled, no tracking	Important
4	`int_tableau__state_assessments_demographic_comps.sql` — race source asymmetry	Important
5	`int_tableau__state_assessments_demographic_comps.sql` — gate relies on unvalidated string match	Important
6	Three YML files — missing model/column descriptions	Important
7	Two YML files — `store_failures: true` on individual tests	Important
8	`stg_google_sheets__state_test_comparison_demographics.sql` — percent summing	Minor
9	`rpt_tableau__state_assessments_dashboard_comps.sql` — undocumented self-join tautology	Minor

Issues 1, 2, and 3 should be resolved before merge. Issues 4-7 should either be fixed or tracked in a follow-up issue with a reference added to the code.

…nt, fix prelim gating, fix FAST source and ALG01 state comps fan-out - int_pearson__all_assessments: add is_504, is_approaching_int, is_below_int columns - int_fldoe__all_assessments (ktaf): fix achievement_level_int via performance_level, add is_approaching_int/is_below_int; filter PM3/PM1 to assessment_name = 'FAST' to prevent EOC/Science contamination - rpt_tableau__academic_goals_rollup: swap stg_fldoe__fast ref to int_fldoe__all_assessments - rpt_tableau__state_assessments_dashboard: add prelim score gating (auto-suppresses when official scores exist); fix state_comps join to use school_level instead of aligned_level_test_code to resolve ALG01 MS/HS fan-out - int_tableau__state_assessments_demographic_comps: fix prelim gate join to use admin instead of season; fix same ALG01 fan-out via admin column - rpt_tableau__state_assessments_dashboard_comps: update column refs to match renamed demographics staging columns Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…; remove store_failures from individual tests - int_pearson__student_list_report: replace .{6}(.{2}) regex with Grade (\d+) + lpad — fixes wrong aligned_test_code for grades 3-9 where single-digit grade names caused positional extraction to capture a space character instead of the grade number - int_tableau__state_assessments_demographic_comps.yml, rpt_tableau__state_assessments_dashboard_comps.yml: remove store_failures: true from individual uniqueness tests — project default handles this Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

GabyRangelB · 2026-04-10T22:14:04Z

@claude - update your review

github-actions · 2026-04-10T22:14:20Z

Claude finished @GabyRangelB's task in 2m 54s —— View job

PR Review: refactor(dbt): state assessments rebuild

Gather context and understand PR scope
Read changed files and check prior review issues
Post updated review feedback

Updated review — all 9 prior issues re-checked against current branch.

Resolved since last review

~~1. Broken regexp_extract for grades 3-9~~ — Fixed ✓
int_pearson__student_list_report.sql lines 46–55 now uses r'Grade (\d+)' with lpad(..., 2, '0'). Single-digit grades (3–9) now produce correct codes (MAT03, ELA09, etc.).

~~2. Dead UNION ALL branch in dashboard~~ — Fixed ✓
The commented-out prelim section in assessment_scores has been removed. Both the prelim UNION ALL in the CTE and the inner join valid_prelim_assessments gate in the NJ prelim SELECT are now live and consistent.

~~3. Contract enforcement disabled with no tracking~~ — Moot ✓
int_pearson__all_assessments is an intermediate model. Per src/dbt/CLAUDE.md, only rpt_, dim_*, fct_*, and staging models require contract: enforced: true. No tracking needed.

~~7. store_failures: true on individual tests~~ — Fixed ✓
Removed from both int_tableau__state_assessments_demographic_comps.yml and rpt_tableau__state_assessments_dashboard_comps.yml.

Still open

4. Race/ethnicity source asymmetry across union branches — Still present

int_tableau__state_assessments_demographic_comps.sql:

NJ official (lines 82–83): reads a.aligned_aggregate_ethnicity — from the Pearson assessment file
NJ prelim (lines 122–138) and FL official (lines 190–207): inline CASE on e.race_ethnicity — from PowerSchool enrollment

The inline CASE handles null → 'Blank' but has no else clause — any non-null, unrecognized code returns NULL, not 'Blank'. Cross-year comparisons can put the same student in different demographic buckets depending on which branch processes them.

5. valid_prelim_assessments gate relies on unvalidated string match — Still present

Both rpt_tableau__state_assessments_dashboard.sql (line 34) and int_tableau__state_assessments_demographic_comps.sql (line 50):

and pa.test_type = p.assessment_name

If the Dagster test_type partition key diverges from the assessment_name string in official records (e.g. 'NJSLA' vs 'NJSLA-S'), prelim data silently flows through even when official scores exist. Needs either an accepted_values test on test_type in the staging model or a comment documenting the coupling.

6. Missing description: on model and all columns — two YML files — Partially fixed; two files remain

Per src/dbt/CLAUDE.md: "All new or modified models require description: on the model and every column."

int_pearson__student_list_report.yml — now has full descriptions ✓
int_tableau__state_assessments_demographic_comps.yml — no model description, no column descriptions ✗
rpt_tableau__state_assessments_dashboard_comps.yml — no model description, no column descriptions ✗

Minor (pre-existing, tracked)

8. ALG01 HS aggregation sums percentages instead of weighted average — Still present

stg_google_sheets__state_test_comparison_demographics.sql lines 55–56:

sum(percent_proficient) as percent_proficient,
sum(total_students) as total_students,

percent_proficient is the sum of HS_09 + HS_10 percentages, not a weighted average. The outer SELECT at line 86 then multiplies percent_proficient * total_students to compute total_proficient_students, compounding the error. Pre-existing, but now load-bearing. Should be tracked in an issue if not already.

9. Region self-join region_matched = true — undocumented tautology — Still present

rpt_tableau__state_assessments_dashboard_comps.sql lines 137–149: When a.comparison_entity = 'Region', b (filtered to comparison_entity = 'Region') matches the same row, so region_matched is always true for Region rows. Likely intentional but should have a comment to prevent future confusion.

Summary

#	Issue	Status
1	`int_pearson__student_list_report.sql` — broken regex	Fixed ✓
2	`rpt_tableau__state_assessments_dashboard.sql` — dead UNION ALL branch	Fixed ✓
3	`int_pearson__all_assessments.yml` — contract enforcement concern	Moot ✓
4	`int_tableau__state_assessments_demographic_comps.sql` — race source asymmetry	Still open
5	`int_tableau__state_assessments_demographic_comps.sql` — unvalidated gate string match	Still open
6	Two YML files — missing model/column descriptions	Partially fixed (2 of 3 remain)
7	Two YML files — `store_failures: true` on individual tests	Fixed ✓
8	`stg_google_sheets__state_test_comparison_demographics.sql` — percent summing	Still present (minor/pre-existing)
9	`rpt_tableau__state_assessments_dashboard_comps.sql` — undocumented tautology	Still present (minor)

Issues 4–6 should be resolved or tracked before merge. Issues 8–9 are minor and pre-existing.

…/delete conflict Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

…from int_pearson and int_fldoe Replace inline subject alignment, test code normalization, proficiency flag computations, and period mapping in downstream consumers of int_pearson__all_assessments and int_fldoe__all_assessments with the pre-computed columns now available on both models. Also adds model + column descriptions to int_tableau__state_assessments_demographic_comps. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

GabyRangelB · 2026-04-17T15:06:56Z

Downstream model changes — notice to model owners

The new columns added to int_pearson__all_assessments and int_fldoe__all_assessments made redundant several inline transformation expressions in downstream models. Those models were updated in this PR to remove the duplication. If you own one of the models below, please review the change summary.

Each change was validated by comparing output row counts and metric values against production BigQuery before and after — zero data differences in all cases except where noted.

@awalters — Anthony Walters

`rpt_tableau__academic_goals_rollup`

What changed: Three branches of state_test_union were computing a subject remap inline. All three now use the pre-computed illuminate_subject column:

Removed	Replaced with
`case when subject like 'English%' then 'Reading' ...` (NJ)	`if(illuminate_subject = 'Text Study', 'Reading', 'Math')`
`if(assessment_subject = 'English Language Arts', 'Reading', 'Math')` (FL ×2)	`if(illuminate_subject = 'Text Study', 'Reading', 'Math')`

Validation: Math/Reading row counts unchanged. 16-row total diff predates this change (FL/iReady/Star upstream data).

Next steps: is_proficient_int, is_approaching_int, is_below_int already read directly from upstream — no further action needed.

`rpt_tableau__miami_fast`

What changed: is_proficient was a FLOAT64 computed inline. Now reads is_proficient_int (INT64) directly; contract YAML updated to match:

Removed	Replaced with
`case ft.is_proficient when true then 1.0 when false then 0.0 end as is_proficient`	`ft.is_proficient_int as is_proficient`
`data_type: float64` in YML	`data_type: int64`

Validation: Zero diff at 39,427 rows. Tableau treats INT64 and FLOAT64 identically for numeric fields.

Next steps: The remaining if(... and fl.is_proficient ...) boolean checks in the disparity/growth CTEs cannot be replaced with is_proficient_int directly, but fast_aggregated_proficiency, sublevel_number, scale_for_proficiency, and points_to_proficiency are now all pre-computed on int_fldoe__all_assessments and may simplify those CTEs further.

`rpt_gsheets__kippmiami_payout_roster` (currently disabled)

What changed: Two aggregation expressions recomputed is_proficient_int inline:

Removed	Replaced with
`round(avg(if(is_proficient, 1, 0)), 2)` ×2	`round(avg(is_proficient_int), 2)` ×2

Validation: Pre-change query confirmed equivalence for all groups — zero mismatches.

Next steps when re-enabling: fast_aggregated_proficiency, sublevel_number, scale_for_proficiency, points_to_proficiency, scale_for_growth, and points_to_growth are now pre-computed on int_fldoe__all_assessments and may simplify the growth and disparity CTEs.

@cbini — Charlie Bini

`int_topline__state_assessments_weekly`

What changed: Both branches computed is_proficient_int inline. Both now read the column directly:

Removed	Replaced with
`case when fl.is_proficient then 1 when not fl.is_proficient then 0 end` (FL)	`fl.is_proficient_int`
`case when p.is_proficient then 1 when not p.is_proficient then 0 end` (NJ)	`p.is_proficient_int`

Validation: Proficient / not-proficient / total counts by region (Camden, Miami, Newark) — zero diff.

`dim_assessments`

What changed: The state_nj CTE re-derived subject area and test code inline:

Removed	Replaced with
`if(subject = 'English Language Arts/Literacy', 'English Language Arts', subject)`	`aligned_subject`
`case testcode when 'SC05' then 'SCI05' ... end`	`aligned_test_code`

Validation: Zero diff across all 24 subject/module combinations.

`fct_assessment_scores_enrollment_scoped`

What changed: The state_nj CTE computed subject alignment, test code normalization, and period mapping inline:

Removed	Replaced with
`if(subject = 'English Language Arts/Literacy', 'English Language Arts', subject)`	`aligned_subject`
`case testcode when 'SC05' then 'SCI05' ... end`	`aligned_test_code`
`if(period = 'FallBlock', 'Fall', period)` ×2	`admin`

Validation: Zero diff across all 23 subject/module combinations. Surrogate key stability confirmed (administration_window values unchanged).

Next steps: The state_fl CTE could be similarly simplified — int_fldoe__all_assessments now exposes is_proficient_int, is_approaching_int, is_below_int, and illuminate_subject directly.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

GabyRangelB · 2026-04-17T15:49:45Z

@claude please update your review

github-actions · 2026-04-17T15:50:00Z

Claude finished @GabyRangelB's task in 1m 51s —— View job

PR Review: refactor(dbt): state assessments rebuild

Gather context and understand PR scope
Read changed files and check prior review issues
Post updated review feedback

Updated review — all 9 prior issues re-checked against current branch.

Resolved since last review

~~4. Race/ethnicity source asymmetry — ml_status leg~~ — Partially fixed ✓
The NJ prelim branch now reads e.ml_status directly (line 119) rather than recomputing it inline, closing the aligned_ml_status vs inline-CASE asymmetry for that field. The aggregate_ethnicity asymmetry and the missing else described below remain.

~~6. Missing model description — int_tableau__state_assessments_demographic_comps.yml~~ — Fixed ✓
Model description and all column descriptions are now present. Only one file remains.

Still open

4. Race/ethnicity source asymmetry and missing else clause — Still present

int_tableau__state_assessments_demographic_comps.sql:

NJ official (line 82): reads a.aligned_aggregate_ethnicity — from the Pearson assessment file
NJ prelim (lines 122–139) and FL official (lines 190–207): inline CASE on e.race_ethnicity — from PowerSchool enrollment

Both inline CASE blocks end with when e.race_ethnicity is null then 'Blank' but have no else clause — any non-null, unrecognized code returns NULL instead of 'Blank'. Cross-year comparisons can land the same student in different demographic buckets depending on which branch processes them.

Fix this →

5. valid_prelim_assessments gate relies on unvalidated string match — Still present

Both rpt_tableau__state_assessments_dashboard.sql (line 34) and int_tableau__state_assessments_demographic_comps.sql (line 50):

and pa.test_type = p.assessment_name

If the Dagster test_type partition key diverges from the assessment_name string in official records, prelim data silently flows through even when official scores exist. Needs either an accepted_values test on test_type in the staging model or a comment documenting the expected coupling.

6. Missing model and column descriptions — rpt_tableau__state_assessments_dashboard_comps.yml — Still open

Per src/dbt/CLAUDE.md: "All new or modified models require description: on the model and every column." The file has only data_type: entries — no description: on the model or any of its 17 columns.

Fix this →

Minor (pre-existing, tracked)

8. ALG01 HS aggregation sums percentages instead of weighted average — Still present

stg_google_sheets__state_test_comparison_demographics.sql line 57:

sum(percent_proficient) as percent_proficient,

This sums HS_09 + HS_10 percentages rather than computing a weighted average. The outer SELECT at line 89 then multiplies percent_proficient * total_students to compute total_proficient_students, compounding the error. Pre-existing and now load-bearing. Should be tracked in an issue if not already.

9. Region self-join region_matched = true — undocumented tautology — Still present

rpt_tableau__state_assessments_dashboard_comps.sql lines 132–151: When a.comparison_entity = 'Region', the join b.comparison_entity = 'Region' matches the same row, so region_matched is always true for Region rows. Likely intentional but should have a comment to prevent future confusion.

Summary

#	Issue	Status
1	`int_pearson__student_list_report.sql` — broken regex	Fixed ✓
2	`rpt_tableau__state_assessments_dashboard.sql` — dead UNION ALL branch	Fixed ✓
3	`int_pearson__all_assessments.yml` — contract enforcement concern	Moot ✓
4	`int_tableau__state_assessments_demographic_comps.sql` — race source asymmetry + missing `else`	Partially fixed (ml_status ✓; aggregate_ethnicity still open)
5	Two models — unvalidated gate string match	Still open
6	`rpt_tableau__state_assessments_dashboard_comps.yml` — missing descriptions	Still open (1 of 3 fixed ✓)
7	Two YML files — `store_failures: true` on individual tests	Fixed ✓
8	`stg_google_sheets__state_test_comparison_demographics.sql` — percent summing	Still present (minor/pre-existing)
9	`rpt_tableau__state_assessments_dashboard_comps.sql` — undocumented tautology	Still present (minor)

Issues 4 (missing else), 5, and 6 should be resolved or tracked before merge.

…est_type coupling comment, and comps yml descriptions Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

GabyRangelB · 2026-04-17T18:11:25Z

@claude please update your review

github-actions · 2026-04-17T18:11:39Z

Claude finished @GabyRangelB's task in 2m 23s —— View job

PR Review: refactor(dbt): state assessments rebuild

Gather context and understand PR scope
Read changed files and check prior review issues
Post updated review feedback

Updated review — all prior issues re-checked against current branch.

Resolved since last review

~~6. Missing model and column descriptions — rpt_tableau__state_assessments_dashboard_comps.yml~~ — Fixed ✓

Full model description and all 16 column descriptions are now present. All three YML files from the original review now have complete descriptions.

Still open

4. Race/ethnicity source asymmetry — missing else clause — Partially fixed

The accepted_values warn test on comparison_demographic_subgroup in int_tableau__state_assessments_demographic_comps.yml (lines 69–83) is a good improvement — it will alert when an unexpected race code produces NULL rather than 'Blank'. However, the underlying CASE blocks in the SQL still lack an else clause, so any non-null, unrecognized race_ethnicity code still silently produces NULL in both the NJ prelim branch (lines 122–139) and FL official branch (lines 190–207). The accepted_values test will warn, but NULL rows will still appear in aggregates until someone investigates. Adding else 'Blank' closes this gap permanently.

Fix this →

5. valid_prelim_assessments gate coupling comment — Partially fixed

int_tableau__state_assessments_demographic_comps.sql now has the comment at line 50 (-- test_type in prelim data matches assessment_name in official records) ✓. However, rpt_tableau__state_assessments_dashboard.sql line 34 has the identical pa.test_type = p.assessment_name join condition with no comment. For consistency and to prevent the same debugging confusion in the other model, that line should also carry the coupling comment.

Fix this →

Minor (pre-existing, tracked)

8. ALG01 HS aggregation sums percentages instead of weighted average — Still present

stg_google_sheets__state_test_comparison_demographics.sql line 57:

sum(percent_proficient) as percent_proficient,

This sums HS_09 + HS_10 percentages rather than a weighted average. The outer SELECT at line 89 then multiplies percent_proficient * total_students to compute total_proficient_students, compounding the error. Pre-existing and load-bearing; should be tracked in an issue if not already.

9. Region self-join region_matched = true — undocumented tautology — Still present

rpt_tableau__state_assessments_dashboard_comps.sql lines 139–151: When a.comparison_entity = 'Region', the join on b.comparison_entity = 'Region' matches the same row, making region_matched always true for Region rows. Likely intentional but a brief comment would prevent future confusion.

Summary

#	Issue	Status
1	`int_pearson__student_list_report.sql` — broken regex	Fixed ✓
2	`rpt_tableau__state_assessments_dashboard.sql` — dead UNION ALL branch	Fixed ✓
3	`int_pearson__all_assessments.yml` — contract enforcement concern	Moot ✓
4	`int_tableau__state_assessments_demographic_comps.sql` — missing `else 'Blank'`	Partially fixed (accepted_values warn test ✓; `else` clause still missing)
5	Gate coupling comment	Partially fixed (comps model ✓; dashboard SQL still missing comment)
6	`rpt_tableau__state_assessments_dashboard_comps.yml` — missing descriptions	Fixed ✓
7	Two YML files — `store_failures: true` on individual tests	Fixed ✓
8	`stg_google_sheets__state_test_comparison_demographics.sql` — percent summing	Still present (minor/pre-existing)
9	`rpt_tableau__state_assessments_dashboard_comps.sql` — undocumented tautology	Still present (minor)

Issues 4 and 5 are the only open blockers. Both are one-line fixes. Issues 8–9 are minor and pre-existing.
·

…sments_dashboard Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

GabyRangelB · 2026-04-17T18:44:07Z

Issue 5 fixed. On issue 4: we intentionally left out else 'Blank' — the accepted_values warn test is the desired behavior. If a new race_ethnicity code appears that isn't in the CASE, we want to know about it so we can explicitly add it to the mapping. Silently falling through to 'Blank' would hide the new code and we'd never know to handle it.

GabyRangelB and others added 12 commits March 26, 2026 19:05

yml-fmt

e7eab5a

cbini marked this pull request as ready for review April 9, 2026 20:03

cbini requested review from a team as code owners April 9, 2026 20:03

GabyRangelB and others added 9 commits April 10, 2026 14:02

forcing rebuild

c6b1471

stage

95b39f0

typo

430a64b

test

6243cf2

test again

98cbcc6

yay

d261a59

simplify demo gsheet

9a9a06f

GabyRangelB and others added 2 commits April 16, 2026 16:22

merge: sync with main, resolve dim_state_assessment_benchmarks modify…

f23d678

…/delete conflict Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

force-remove-row

4832edb

GabyRangelB added 4 commits April 16, 2026 19:07

trying to fix gh error

001df39

ci: retrigger CI

a4eee30

fix(dbt): add Paterson to NJ region unnest in state assessments comps

6db83bf

Merge branch 'main' into claude/feat/stat-rebuild

93788d9

GabyRangelB requested review from a team as code owners April 17, 2026 13:12

GabyRangelB and others added 2 commits April 17, 2026 13:16

chore: merge origin/main into claude/feat/stat-rebuild

1a2c29b

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

test(dbt): add uniqueness test to int_powerschool__ps_adaadm_daily_ctod

364e2ca

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

test(dbt): address PR review — ethnicity accepted_values warn test, t…

3e3388b

…est_type coupling comment, and comps yml descriptions Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

test(dbt): add test_type coupling comment to rpt_tableau__state_asses…

507c396

…sments_dashboard Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(dbt): state assessments rebuild#3553

refactor(dbt): state assessments rebuild#3553
GabyRangelB wants to merge 32 commits intomainfrom
claude/feat/stat-rebuild

GabyRangelB commented Mar 30, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Apr 9, 2026 •

edited

Loading

Uh oh!

GabyRangelB commented Apr 10, 2026

Uh oh!

github-actions bot commented Apr 10, 2026 •

edited

Loading

Uh oh!

GabyRangelB commented Apr 17, 2026

Uh oh!

GabyRangelB commented Apr 17, 2026

Uh oh!

github-actions bot commented Apr 17, 2026 •

edited

Loading

Uh oh!

GabyRangelB commented Apr 17, 2026

Uh oh!

github-actions bot commented Apr 17, 2026 •

edited

Loading

Uh oh!

GabyRangelB commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

GabyRangelB commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pull Request

Summary & Motivation

`stg_google_sheets__state_test_comparison_demographics`

`int_pearson__all_assessments`

`int_pearson__student_list_report` (new)

`int_fldoe__all_assessments` (kipptaf)

`int_tableau__state_assessments_demographic_comps`

`rpt_tableau__state_assessments_dashboard`

`rpt_tableau__state_assessments_dashboard_comps`

`rpt_tableau__academic_goals_rollup`

`base_powerschool__course_enrollments`

`int_extracts__student_enrollments_subjects`

New model: `int_extracts__student_enrollments_courses`

YMLs

AI Assistance

Self-review

General

dbt

CI checks

Troubleshooting

Uh oh!

github-actions bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: refactor(dbt): state assessments rebuild

Critical

Important

Minor

Summary

Uh oh!

GabyRangelB commented Apr 10, 2026

Uh oh!

github-actions bot commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: refactor(dbt): state assessments rebuild

Resolved since last review

Still open

Minor (pre-existing, tracked)

Summary

Uh oh!

GabyRangelB commented Apr 17, 2026

Downstream model changes — notice to model owners

@awalters — Anthony Walters

rpt_tableau__academic_goals_rollup

rpt_tableau__miami_fast

rpt_gsheets__kippmiami_payout_roster (currently disabled)

@cbini — Charlie Bini

int_topline__state_assessments_weekly

dim_assessments

fct_assessment_scores_enrollment_scoped

Uh oh!

GabyRangelB commented Apr 17, 2026

Uh oh!

github-actions bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: refactor(dbt): state assessments rebuild

Resolved since last review

Still open

Minor (pre-existing, tracked)

Summary

Uh oh!

GabyRangelB commented Apr 17, 2026

Uh oh!

github-actions bot commented Apr 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: refactor(dbt): state assessments rebuild

Resolved since last review

Still open

Minor (pre-existing, tracked)

Summary

Uh oh!

GabyRangelB commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

GabyRangelB commented Mar 30, 2026 •

edited

Loading

github-actions bot commented Apr 9, 2026 •

edited

Loading

github-actions bot commented Apr 10, 2026 •

edited

Loading

`rpt_tableau__academic_goals_rollup`

`rpt_tableau__miami_fast`

`rpt_gsheets__kippmiami_payout_roster` (currently disabled)

`int_topline__state_assessments_weekly`

`dim_assessments`

`fct_assessment_scores_enrollment_scoped`

github-actions bot commented Apr 17, 2026 •

edited

Loading

github-actions bot commented Apr 17, 2026 •

edited

Loading